
# Predicting and Preventing Gun Violence: An Experimental Evaluation of READI Chicago


## Overview

This package contains the replication code for each analysis, table, and figure included in the paper. The majority of the code is written in R, with a handful of analyses performed in Stata. This package relies on an analysis file of the following structure: one row per study member; columns for treatment status, take-up information, baseline measures (pre-randomization), and outcome measures (post-randomization). 
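As an illustration only, the R sketch below builds a toy data frame with that shape. The column names (`study_id`, `treatment`, `took_up`, `baseline_arrests`, `outcome_arrested`) and all values are hypothetical placeholders, not the variable names or contents of the confidential analysis file.

```r
# Toy analysis file: one row per study member.
# All column names and values are hypothetical, for illustration only.
analysis_file <- data.frame(
  study_id         = c(1L, 2L, 3L, 4L),  # unique study-member identifier
  treatment        = c(1L, 0L, 1L, 0L),  # treatment status (1 = treatment group)
  took_up          = c(1L, 0L, 0L, 0L),  # take-up information
  baseline_arrests = c(3L, 7L, 2L, 5L),  # baseline (pre-randomization) measure
  outcome_arrested = c(0L, 1L, 0L, 1L)   # outcome (post-randomization) measure
)

# Check the one-row-per-member structure.
stopifnot(!anyDuplicated(analysis_file$study_id))
```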


## Data availability

Please note that the paper uses confidential individual-level administrative data from the Chicago Police Department, Cook County Sheriff’s Office, Illinois Department of Corrections, and Heartland Alliance. We are legally prohibited from releasing or sharing these datasets due to the data use agreements in place. To support transparency and replication, however, we are happy to connect interested researchers with our contacts at these agencies to assist with requests for the data.


## Dataset list

The raw datasets used to create the analysis file referenced in this replication code are listed below. For more information, see Appendix A.2 of the paper.

### Arrests and victimizations

> Source: Chicago Police Department

These datasets were used to generate baseline covariates and future arrest and victimization outcome measures. 

| Data set                | Notes                                                                             | Provided |
|-------------------------|-----------------------------------------------------------------------------------|----------|
| Arrests                 | Offender and charge information for arrests from 1999 to present                  | No       |
| Victimizations          | Victim and incident information for reported crimes from 1999 to present          | No       |
| Shooting victimizations | Victim and incident information for reported shooting crimes from 2010 to present | No       |
| Homicide victimizations | Victim and incident information for reported homicides from 1982 to present       | No       |


### Incarceration (jail and prison)

> Source: Cook County Sheriff's Office (CCSO) and Illinois Department of Corrections (IDOC), respectively

These datasets were used to generate baseline covariates and measures of incapacitation during the outcome period.

| Data set            | Notes                                                                                | Provided |
|---------------------|--------------------------------------------------------------------------------------|----------|
| CCSO jail records   | Daily detainees from 2015 to present                                                 | No       |
| IDOC prison records | Entry and exit dates for detainees in state carceral facilities from 1999 to present | No       |


### Program participation 

> Source: READI providers (e.g. Heartland Alliance)

These datasets were used to measure program take-up and dosage.

| Data set     | Notes                                                      | Provided |
|--------------|------------------------------------------------------------|----------|
| Headcount    | Program attendance data from Heartland Alliance            | No       |
| Employment   | Job attendance data from READI employment providers        | No       |
| Payroll      | Wage data from Heartland Alliance (the employer of record) | No       |



## Description of programs/code

The code is partitioned into seven top-level folders that can be classified as either analysis or table/figure creation: 

Analysis

* `get_realized_risk`: Computes post-randomization outcome rates for control group members.
* `run_regressions`: Loops through the regression specifications defined in `run_regressions/hand/config.yaml` and runs regressions according to the stated parameters. Covariate lists, groups of outcome measures, subgroups, and regression samples are defined in various configuration files also found in `run_regressions/hand/`.
* `run_benefit_cost_regressions`: Runs the regressions associated with the benefit-cost analyses.
* `run_stata_regressions`: Runs the two regression specifications that were better suited for Stata: double lasso and randomization inference. 
* `run_stata_dosage_model`: Builds a model predicting dosage levels among the treatment group using baseline criminal history and incarceration measures. 
* `compute_multiple_hypothesis_adjustment`: Computes p-values adjusted for multiple hypothesis testing using output from the `run_regressions` task.

Table and figure creation

* `generate_tables_and_figures`: Formats the output of the analysis tasks described above into the tables and figures seen in the paper. A mapping of scripts to specific tables and figures can be found below. 

The main orchestration script for each task folder can be found in the `src` subfolder. Many of the task folders also contain a `hand` subfolder with task-specific parameters and an `R` subfolder with task-specific helper functions. Functions and parameters that were utilized across multiple tasks can be found in `R/project_functions.R` and `hand/project_config.yaml`, respectively.
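A minimal sketch of the configuration-driven loop described for `run_regressions` might look like the following. It does not reproduce the package's code: the `specs` list stands in for a parsed configuration file, its fields (`name`, `outcome`, `covariates`) are assumed for illustration, and the data are simulated.

```r
# Hypothetical specifications, standing in for a parsed config file.
# The real structure of run_regressions/hand/config.yaml may differ.
specs <- list(
  list(name = "itt_no_covariates",   outcome = "y", covariates = character(0)),
  list(name = "itt_with_covariates", outcome = "y", covariates = c("x1", "x2"))
)

# Simulated data; all variable names are placeholders.
set.seed(1)
n <- 200
df <- data.frame(treatment = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))
df$y <- 0.5 * df$treatment + df$x1 + rnorm(n)

# Loop over specifications, build each formula, and fit OLS.
results <- lapply(specs, function(spec) {
  fml <- reformulate(c("treatment", spec$covariates), response = spec$outcome)
  fit <- lm(fml, data = df)
  data.frame(
    spec     = spec$name,
    estimate = unname(coef(fit)["treatment"]),
    std_err  = coef(summary(fit))["treatment", "Std. Error"]
  )
})
results <- do.call(rbind, results)
print(results)
```

Keeping the specifications in data rather than in code means a new regression can be added by editing a configuration entry instead of the orchestration script.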

To ensure reproducibility, random seeds are set in the following locations:

* Line 165, `run_regressions/src/run_regressions.R`
* Lines 34 and 83, `run_stata_regressions/src/run_stata_RI_regressions.do`
* Line 39, `run_stata_dosage_model/src/run_stata_dosage_model.do`


This code is licensed under the GPLv3 license. See LICENSE for details. The replication package contains a lightly modified version of the [ESTRAT Stata module](https://ideas.repec.org/c/boc/bocode/s457801.html), which can be found at `run_stata_dosage_model/src/estrat_dose.do`.


## List of tables and figures

The provided code reproduces all tables and quantitative figures in the paper, as detailed below.

| Figure/Table #   | Script (in `generate_tables_and_figures` subfolder)                             | Output file                      
|------------------|---------------------------------------------------------------------------------|-------------------------------------------------
| Table I          | gen_baseline_outputs/src/gen_baseline_outputs.Rmd                               | baselines_primary_up_to_date.tex
| Table II         | gen_realized_risk_table/src/gen_realized_risk_table.Rmd                         | realized_risk_up_to_date.tex
| Table III        | gen_payroll_outputs/src/print_payroll_tables.Rmd                                | wage_table_up_to_date.tex
| Table IV         | gen_primary_outcomes_table/src/gen_primary_outcomes_table.Rmd                   | outcomes_primary_up_to_date.tex
| Table V          | gen_outcome_by_subgroup_outputs/src/gen_outcome_by_subgroup_outputs.Rmd         | outcomes_subgroup_pathway_up_to_date.tex
| Table VI         | gen_outcome_by_subgroup_outputs/src/gen_outcome_by_subgroup_outputs.Rmd         | outcomes_subgroup_risk_up_to_date.tex
| Table VII        | gen_benefit_cost_outputs/src/gen_benefit_cost_outputs.Rmd                       | outcomes_benefit_cost_up_to_date.tex
| Figure I         | gen_shoot_homs_per_capita/gen_shoot_homs_per_capita/src/gen_shoot_hom_plot.R    | top_shooting_neighborhood_plot_up_to_date.png
| Figure II        | gen_retention_plots/src/03_plot_retention.R                                     | retention_precovid_combined_up_to_date.jpg
| Figure III       | gen_pathway_risk_plots/src/gen_pathway_risk_plots.R                             | main_pathway_baseline_risk_up_to_date.pdf
| Table A.I        | gen_baseline_outputs/src/gen_baseline_outputs.Rmd                               | baselines_pathway_up_to_date.tex
| Table A.II       | gen_payroll_outputs/src/print_payroll_tables.Rmd                                | pct_hours_up_to_date.tex
| Table A.IV       | gen_payroll_outputs/src/print_payroll_tables.Rmd                                | wage_table_appendix_up_to_date.tex
| Table A.V        | gen_alternate_spec_outputs/src/gen_alternate_spec_outputs.Rmd                   | other_specifications_up_to_date.tex
| Table A.VI       | gen_RI_estimate_table/src/gen_RI_table.Rmd                                      | randomization_inference_up_to_date.tex
| Table A.VII      | gen_incapacitation_scaled_table/src/gen_incapacitation_scaled_table.Rmd         | incapacitation_scaled_rate_up_to_date.tex
| Table A.VII      | gen_covid_outputs/src/gen_covid_outputs.Rmd                                     | covid_models_with_means_up_to_date.tex
| Table A.VIII     | gen_time_of_day_outputs/src/print_time_of_day_outcomes.Rmd                      | time_of_day_results_wide_up_to_date.tex
| Table A.IX       | gen_separated_outcomes_table/src/gen_separated_outcomes_table.Rmd               | outcomes_primary_separated_up_to_date.tex
| Table A.X        | gen_other_outcomes_table/src/gen_other_outcomes_table.Rmd                       | outcomes_other_up_to_date.tex
| Table A.XI       | gen_time_of_day_outputs/src/print_time_of_day_outcomes.Rmd                      | time_of_day_results_secondary_up_to_date.tex
| Table A.XII      | gen_outcome_by_subgroup_outputs/src/gen_outcome_by_subgroup_outputs.Rmd         | outcomes_subgroup_neighborhood_up_to_date.tex
| Table A.XIII     | gen_outcome_by_subgroup_outputs/src/gen_outcome_by_subgroup_outputs.Rmd         | outcomes_subgroup_age_up_to_date.tex
| Table A.XIV      | gen_outreach_model_outputs/src/gen_outreach_model_outputs.Rmd                   | outreach_model_up_to_date.tex
| Table A.XV       | gen_pathway_reweighted_table/src/print_ao_table.Rmd                             | andrews_oster_table_up_to_date.tex
| Table A.XVI      | gen_takeup_model_outputs/src/gen_takeup_model_outputs.Rmd                       | takeup_model_up_to_date.tex
| Table A.XVII     | gen_dosage_outputs/src/dosage_heterogeneity_output.do                           | predicted_dosage_up_to_date.tex
| Table A.XVIII    | gen_benefit_cost_appendix_outputs/src/gen_benefit_cost_appendix_outputs.Rmd     | appendix_benefit_cost_up_to_date.tex
| Table A.XIX      | gen_prog_component_importance_table/src/gen_prog_component_importance_table.Rmd | program_component_importance_up_to_date.tex
| Figure A.I       | N/A (non-quantitative)                                                          | N/A (non-quantitative)
| Figure A.II      | gen_payroll_outputs/src/01_wage_growth_plot.R                                   | wage_growth_pathway_up_to_date.png
| Figure A.III     | gen_retention_plots/src/03_plot_retention.R                                     | retention_fullperiod_combined_up_to_date.jpg
| Figure A.IV      | gen_dosage_outputs/src/dosage_heterogeneity_calibration.do                      | dosage_calib_plot_up_to_date.pdf
| Figure A.V       | gen_time_varying_itt_outputs/src/gen_time_varying_itt_outputs.Rmd               | time_varying_itt_plot_up_to_date.png
| Figure A.VI      | gen_pathway_risk_plots/src/gen_pathway_risk_plots.R                             | appendix_pathway_baseline_risk_up_to_date.pdf

Note that, in addition to generating outputs with names ending in `up_to_date`, each script also outputs a version whose name ends in the timestamp at runtime.
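The naming pattern could be sketched in R as follows. The helper `make_output_paths` and the timestamp format are assumptions made for illustration; the scripts in this package may construct their file names differently.

```r
# Hypothetical helper: build both output paths for one table or figure.
# The function name and timestamp format are assumptions for illustration.
make_output_paths <- function(stem, ext, dir = ".") {
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  c(
    latest  = file.path(dir, paste0(stem, "_up_to_date.", ext)),
    archive = file.path(dir, paste0(stem, "_", stamp, ".", ext))
  )
}

paths <- make_output_paths("outcomes_primary", "tex", dir = tempdir())

# Write the same content to both locations.
table_lines <- c("\\begin{tabular}{lc}", "Outcome & Estimate \\\\", "\\end{tabular}")
for (p in paths) writeLines(table_lines, p)
```

The `up_to_date` copy gives downstream documents a stable file name to reference, while the timestamped copy preserves a record of each run.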



## Instructions to replicators

- Edit `config.mk` to update the file paths to match your machine
- Run the code: 
  - To run all code from start to finish, navigate to the `readi_evaluation_replication` folder in a bash terminal and enter `make`
  - To run only the code in a specific folder, navigate to that folder in a bash terminal and enter `make`


## Computational requirements

### Software Requirements

- R (3.6.3)
- Stata (17.0)

The latest version of each R and Stata package the code relies on will be installed automatically as needed.

The codebase relies on GNU Makefiles to automate running each script in sequence. Use of this functionality requires access to a bash terminal, like the one available on Linux machines.

### Memory and Runtime Requirements

The code was run on a large shared machine running Ubuntu 22.04 with 30 Intel(R) Xeon(R) CPUs and 512 gigabytes of RAM. Using a small portion of the available resources, the total runtime of the code was ~24 hours. The vast majority of this runtime can be attributed to two computationally intensive tasks: 1) computing family-wise error rates by permuting treatment and running OLS for 25,000 iterations, and 2) building a dosage prediction model.
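To give a sense of why the first task is expensive, the sketch below shows one common permutation approach to family-wise error control: a single-step max-|t| adjustment. It is an illustration under stated assumptions, not the package's implementation; the data are simulated, the function `abs_t_stats` is hypothetical, and the procedure used in the paper may differ in its details.

```r
set.seed(42)  # fix the seed so the permutations are reproducible

# Simulated data; names and sizes are placeholders.
n <- 300
treatment <- rbinom(n, 1, 0.5)
outcomes <- data.frame(
  y1 = 0.4 * treatment + rnorm(n),  # outcome with a true effect
  y2 = rnorm(n),                    # null outcome
  y3 = rnorm(n)                     # null outcome
)

# Absolute t-statistic on the treatment coefficient for each outcome.
abs_t_stats <- function(d, outcomes) {
  vapply(outcomes, function(y) {
    abs(coef(summary(lm(y ~ d)))["d", "t value"])
  }, numeric(1))
}

observed <- abs_t_stats(treatment, outcomes)

# Permute treatment, re-run OLS, and record the largest |t| across outcomes.
n_perm <- 500  # kept small here; the text above reports 25,000 iterations
max_t <- replicate(n_perm, max(abs_t_stats(sample(treatment), outcomes)))

# Adjusted p-value: share of permutations whose max |t| meets or exceeds
# each observed |t| (with the usual +1 correction).
adj_p <- vapply(observed, function(t_obs) {
  (1 + sum(max_t >= t_obs)) / (1 + n_perm)
}, numeric(1))
print(round(adj_p, 3))
```

Each iteration refits one regression per outcome, so the cost grows with the product of the number of permutations and the number of outcomes.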




``` {=html}
<style>
body { min-width: 80% !important; }
</style>
```
